Extracting Crime Information from Online Newspaper Articles

Authors

  • Rexy Arulanandam
  • Bastin Tony Roy Savarimuthu
  • Maryam Purvis
Abstract

Information extraction is the task of extracting relevant information from unstructured data. This paper aims to ‘mine’ (or extract) crime information from online newspaper articles and make it available to the public. With few exceptions, countries that hold this information do not make it available to their citizens. The paper therefore focuses on automatically extracting public yet ‘hidden’ information from newspaper articles. To demonstrate the feasibility of the approach, it concentrates on one type of crime, theft, and shows how theft-related information can be extracted from newspaper articles from three different countries. The system employs Named Entity Recognition (NER) algorithms to identify locations in sentences. However, not all locations reported in an article are crime locations, so the system uses Conditional Random Fields (CRF), a machine learning approach, to classify whether a sentence in an article is a crime location sentence or not. This work compares the performance of four different NERs in identifying locations and their subsequent impact on classifying a sentence as a ‘crime location’ sentence. It investigates whether a CRF-based classifier model trained to identify crime locations in one set of articles can be used to identify crime locations in articles from another newspaper in the same country (New Zealand). It also compares the accuracy of identifying crime location sentences when the developed model is applied to newspapers from two other countries (Australia and India).
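The two-stage idea in the abstract (find locations, then decide per sentence whether it is a crime location sentence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the gazetteer `KNOWN_LOCATIONS` is a hypothetical stand-in for a real NER system, the cue list `THEFT_CUES` and all function names are assumptions made for illustration, and the code stops at feature extraction. The resulting per-sentence feature dictionaries are the kind of input a linear-chain CRF toolkit would typically be trained on.

```python
import re
from typing import Dict, List

# Hypothetical toy gazetteer standing in for a real NER system.
KNOWN_LOCATIONS = {"dunedin", "george street", "auckland"}
# Hypothetical cue words suggesting a theft report.
THEFT_CUES = {"stolen", "stole", "theft", "burglary", "robbed"}


def split_sentences(article: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", article)
    return [p.strip() for p in parts if p.strip()]


def find_locations(sentence: str) -> List[str]:
    """Return gazetteer entries that occur in the sentence (substring match)."""
    lowered = sentence.lower()
    return sorted(loc for loc in KNOWN_LOCATIONS if loc in lowered)


def has_theft_cue(sentence: str) -> bool:
    """True if any cue word appears as a whole word in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & THEFT_CUES)


def sentence_features(sentences: List[str], i: int) -> Dict[str, object]:
    """Features for sentence i, including context from the previous sentence."""
    feats: Dict[str, object] = {
        "has_location": bool(find_locations(sentences[i])),
        "has_theft_cue": has_theft_cue(sentences[i]),
        "position": i,
    }
    if i > 0:
        feats["prev_has_theft_cue"] = has_theft_cue(sentences[i - 1])
    return feats


def article_to_sequence(article: str) -> List[Dict[str, object]]:
    """One feature dict per sentence: a sequence a CRF could be trained on."""
    sentences = split_sentences(article)
    return [sentence_features(sentences, i) for i in range(len(sentences))]


if __name__ == "__main__":
    text = (
        "A laptop was stolen from a car on George Street. "
        "Police in Dunedin are investigating. "
        "The owner had recently moved from Auckland."
    )
    for feats in article_to_sequence(text):
        print(feats)
```

In this toy article all three sentences mention a location, but only the first also carries a theft cue. That gap is why location detection alone is not enough and a sentence-level classifier is needed on top of it.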


Similar resources

Discoursal Analysis of Rhetorical Structure of an Online Iraqi English Newspaper

Abstract Rhetorical structure is helpful in improving how the writers maintain cohesion in their writings. This study examines how the Iraqi writers maintain cohesion in the text by analyzing the various rhetorical moves in Azzaman, an online Iraqi newspaper. To this purpose, twelve opinion articles from Azzaman Iraqi newspaper, published from January 2013 to June 2013 were analyzed. The findin...

Thematic Progression in the Rhetorical Sections of an Online Iraqi English Newspaper

Abstract Thematic development refers to the way theme and rheme in the clause are developed. The theory of rhetorical structure can be defined as the strategies that follow specific ways to make writing more persuasive. The present study aimed to examine how Iraqi writers maintain cohesion in the text by analyzing the patterns of thematic progression in various rhetorical sections in an online ...

Automatic Extraction of Event Information from Newspaper Articles and Web Pages

In this paper, we propose a method for extracting travel-related event information, such as an event name or a schedule, from automatically identified newspaper articles in which particular events are mentioned. We analyze news corpora using our method, extracting venue names from them. We then find web pages that refer to event schedules for these venues. To confirm the effectiveness of our met...

Publication date: 2014